Crochemore's String Matching Algorithm: Simplification, Extensions, Applications
Abstract
We address the problem of string matching in the special case where the pattern is very long. First, constant extra space algorithms are desirable with long patterns, and we describe a simplified version of Crochemore’s algorithm retaining its linear time complexity and constant extra space usage. Second, long patterns are unlikely to occur in the text at all. Thus we define a generalization of string matching called Longest Prefix Matching that asks for the occurrences of the longest prefix of the pattern occurring in the text at least once, and modify the simplified Crochemore’s algorithm to solve this problem. Finally, we define and solve the problem of Sparse Longest Prefix Matching that is useful when the pattern has to be split into multiple pieces because it is too long to be processed in one piece. These problems are motivated by and have application in Lempel-Ziv (LZ77) factorization.
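To make the Longest Prefix Matching problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the simplified Crochemore's algorithm described in the abstract: it runs in quadratic time in the worst case and serves only as a reference for what the problem asks. The function name `longest_prefix_matching` and the return convention (a length plus a list of start positions) are assumptions made for this illustration.

```python
def longest_prefix_matching(pattern, text):
    """Brute-force reference for Longest Prefix Matching (illustrative only).

    Returns (length, positions): the length of the longest prefix of
    `pattern` that occurs somewhere in `text`, and every start position
    in `text` where that prefix occurs. If no character of the pattern
    matches anywhere, returns (0, []).
    """
    best_len = 0
    positions = []
    for start in range(len(text)):
        # Length of the common prefix of pattern and text[start:].
        k = 0
        while (k < len(pattern) and start + k < len(text)
               and pattern[k] == text[start + k]):
            k += 1
        if k > best_len:
            # A strictly longer prefix was found: restart the position list.
            best_len, positions = k, [start]
        elif k == best_len and k > 0:
            # Another occurrence of the current longest prefix.
            positions.append(start)
    return best_len, positions


if __name__ == "__main__":
    # The full pattern "abcd" does not occur, but its prefix "abc" does.
    print(longest_prefix_matching("abcd", "xabcabx"))
```

When the whole pattern occurs in the text, the returned length equals the pattern length and the positions are the ordinary string matching occurrences, which is the sense in which the problem generalizes string matching.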
Similar resources
Fast Multiple String Matching Using Streaming SIMD Extensions Technology
Searching for all occurrences of a given set of patterns in a text is a fundamental problem in computer science with applications in many fields, like computational biology and intrusion detection systems. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. This study introdu...
Towards a Very Fast Multiple String Matching Algorithm for Short Patterns
Multiple exact string matching is one of the fundamental problems in computer science and finds applications in many other fields, among which computational biology and intrusion detection. It turns out that short patterns appear in many instances of such problems and, in most cases, sensibly affect the performances of the algorithms. Recent solutions in the field of string matching try to expl...
Fast Packed String Matching for Short Patterns
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. In ...
String Matching Application for Network Security
String matching is a key component of network security, and biological applications and many other areas benefit from faster string matching algorithms. The effectiveness and efficiency of string matching algorithms are important for applications such as network intrusion detection systems, virus detection, medical science and web content filtering systems. This paper reviews what work has been done i...
A Quick String Matching Employing Mixing Up
Most current string matching algorithms slow down as the number of patterns increases. In this paper, a fast matching algorithm named SSEMatch is designed. The PHADDW instruction from the SSE (Streaming SIMD Extensions) set is used in SSEMatch to produce data confusion, by which the patterns can be distributed into pseudo-hash addresses such that there will be fewer patterns left for verificat...